We present a new distributed algorithm for state space minimization modulo branching bisimulation.
Like its predecessor it uses signatures for refinement, but the refinement process and the signatures
have been optimized to exploit the fact that the input graph contains no τ-loops.

The optimization in the refinement process is meant to reduce both the number of iterations
needed and the memory requirements. In the former case we cannot prove that there is an improvement,
but our experiments show that in many cases the number of iterations is smaller. In the latter
case, we can prove that the worst case memory use of the new algorithm is linear in the size of the
state space, whereas the old algorithm has a quadratic upper bound.

The paper includes a proof of correctness of the new algorithm and the results of a number of 
experiments that compare the performance of the old and the new algorithms. 



1 Introduction 

The idea of distributed model checking of very large systems is to store the state space in the collective
memory of a cluster of workstations, and employ parallel algorithms to analyze the graph. One approach
is to generate the graph in a distributed way, and on-the-fly (i.e. during generation) run a distributed
model checking algorithm. This is what is done in the DiVinE toolset [?]. This is useful if the system is
expected to contain bugs, because the generation can stop after finding the first bug.

Another approach is to generate the full state space in a distributed way, and subsequently run a 
distributed bisimulation reduction algorithm. The result is usually much smaller, and satisfies the same 
temporal logic properties. The minimized graph could be small enough to analyse with sequential model 
checkers. This approach is useful for certification, because many properties can be checked on the 
minimized graph. This paper contributes to the second approach. 

The process-algebraic way of abstracting from actions is to hide them by renaming them to the
invisible action τ. To reason about equivalence of these abstracted models, branching bisimulation
[13, 5] can be used. Because branching bisimulation is coarser than strong bisimulation, this leads to smaller
state spaces modulo reduction.

Distributed minimization algorithms have been proposed in [?] for strong bisimulation, and in [8]
for branching bisimulation. These are signature-based algorithms, which work by successively refining 
the trivial partition, according to the (local) signature of states with respect to the previous partition. 

The best-known sequential algorithm [15] for branching bisimulation reduction assumes that the
state space has no τ-cycles. The idea is that any τ-cycles can be removed in linear time, by Tarjan's
algorithm to detect (and eliminate) strongly connected components (SCC) [?]. Eliminating SCCs preserves
branching bisimulation.

Because eliminating τ-cycles in distributed graphs seemed complicated, the algorithm in [8] works
on any LTS, i.e. it doesn't assume the absence of τ-cycles. This generality came with a certain cost:
signatures have to be transported over the transitive closure of silent τ-steps.¹ For some cases this leads
to increased time and memory usage.

Later, several distributed SCC detection (and elimination) algorithms have been developed [19, 18,
16, 3]. It has already been reported in [18] that running SCC elimination as a preprocessing step to the
branching minimization algorithm of [8] reduces the overall time. Note that this gain was achieved even
though the minimization algorithm doesn't assume that the input graph is τ-acyclic.

In this paper, we further improve this method, by exploiting the fact that the input graph of the 
minimization algorithm has no T-cycles. Using this extra knowledge, we are able to develop a distributed 
minimization algorithm that runs in less time and memory. 

At the heart of our improved method is a notion of inductive signature. Normally, during a round
of signature computations, only the signatures of the previous round may be used. The basic idea of
inductive signatures is that the new signature of a state may depend on the current signature of its
$a$-successors, provided $\xrightarrow{a}$ is guaranteed to terminate. We will first illustrate this notion for
strong bisimulation, and then apply it to branching bisimulation, where τ is cycle-free, i.e. $\xrightarrow{\tau}$ is a
terminating transition relation. Note that if all action labels are terminating, the graph is actually a
directed acyclic graph, for which it is known that there is a linear algorithm for bisimulation reduction.

Overview. In the next section, we will explain the theory and prove the correctness of the improved
signature bisimulation. In Section 3, we explain how we turned the definition of inductive signature
bisimulation into a distributed algorithm and how we implemented it on top of the LTSmin toolset.² We
show the results of running the tool on several problems in Section 4.

2 Theory 

In this section, we start by recalling the basic definitions of LTS and bisimulation, followed by the
definitions of signature refinement from previous papers. Then we present inductive signatures for strong
bisimulation, followed by inductive signatures for branching bisimulation. We end this section with the
correctness proof for branching bisimulation.

2.1 Preliminaries 

First, we fix a notation for labeled transition systems and recall the definitions of strong bisimulation and
branching bisimulation [13, 5]. Our transition systems are labeled with actions from a given set $Act$. The
invisible action τ is a member of $Act$.

Definition 1 (LTS) A labeled transition system (LTS) is a triple $(S, \to, s^0)$, consisting of a set of states
$S$, transitions $\to \subseteq S \times Act \times S$ and an initial state $s^0 \in S$.

We write $s \xrightarrow{a} t$ for $(s,a,t) \in \to$, and use $\xrightarrow{a}^{*}$ to denote the transitive reflexive closure of $\xrightarrow{a}$.
Both strong and branching bisimulation can be defined in two ways: as a relation between two LTSs
or as a relation on one LTS. We choose the latter.

Definition 2 (strong bisimulation) Given an LTS $(S, \to, s^0)$. A symmetric relation $R \subseteq S \times S$ is a strong
bisimulation if

$\forall s,t,s' \in S : \forall a \in Act : s \mathrel{R} t \wedge s \xrightarrow{a} s' \Rightarrow \exists t' \in S : t \xrightarrow{a} t' \wedge s' \mathrel{R} t'$.

¹ A τ-step is silent if the source and destination are equivalent (with respect to the previous partition).
² http://fmt.cs.utwente.nl/tools/ltsmin/

Definition 3 (branching bisimulation) Given an LTS $(S, \to, s^0)$. A symmetric relation $R \subseteq S \times S$ is a
branching bisimulation if

$\forall s,t,s' \in S : \forall a \in Act : s \mathrel{R} t \wedge s \xrightarrow{a} s' \Rightarrow (a = \tau \wedge s' \mathrel{R} t) \vee (\exists t',t'' \in S : t \xrightarrow{\tau}^{*} t' \wedge s \mathrel{R} t' \wedge t' \xrightarrow{a} t'' \wedge s' \mathrel{R} t'')$

Two states $s,t \in S$ are branching bisimilar (denoted $s \underline{\leftrightarrow} t$) if there exists a branching bisimulation $R$
such that $s \mathrel{R} t$.

For proving correctness, we will use a few properties: 

Proposition 4 Given an LTS:

• the relation $\underline{\leftrightarrow}$ is a branching bisimulation;

• if $R$ is a branching bisimulation then $R \subseteq \underline{\leftrightarrow}$.

For a proof see [13].

To talk about bisimulation reduction algorithms, we need the terminology of partition refinement. 
Given a set $S$:

• A set of sets $\{S_1, \dots, S_N\}$ is a partition of $S$ if $S = S_1 \cup \dots \cup S_N$ and $\forall i \neq j : S_i \cap S_j = \emptyset$. Each set $S_i$
is referred to as a block and must be non-empty.

• A partition $\{S_1, \dots, S_N\}$ is a refinement of a partition $\{S'_1, \dots, S'_M\}$ if $\forall i \, \exists j : S_i \subseteq S'_j$.

• Any partition $\{S_1, \dots, S_N\}$ can be represented with an identity function $ID : S \to \mathbb{N}$, defined as
$ID(s) = i$, if $s \in S_i$.

2.2 Signature Refinement 

We continue with the previously published variant of signature refinement. Because many results are
correct for finite LTSs only, we assume that both $Act$ and all LTSs are finite for the remainder of the
paper.

The signature of a state is computed with respect to a partition. Intuitively, the signature of a state is
the set of moves that are possible in that state, where each move is an action paired with the block
(represented by a number) that it leads to. Formally:

Definition 5

• The set of signatures $Sig$ is the set of finite subsets of $Act \times \mathbb{N}$.

• A partition $\pi$ of an LTS $(S, \to, s^0)$ is a function $\pi : S \to \mathbb{N}$.

• A signature function is a function $sig : (S \to \mathbb{N}) \times S \to Sig$, such that for all isomorphisms $\phi : \mathbb{N} \to \mathbb{N}$
and all partitions $\pi$:

$\forall s \in S : sig(\phi \circ \pi, s) = \{(a, \phi(n)) \mid (a,n) \in sig(\pi,s)\}$

The last clause is to ensure that the equality on signatures is independent of how numbers are chosen 
to represent partitions. This is important because we want to do a refinement process, where based on 
a partition, we compute signatures, which we turn into a partition, for which we compute signatures, 
etc. until the partition is stable. This requires translating signatures (or better pairs of previous partition 
numbers and signatures) to integers, which we do by means of given isomorphisms: 

$h_1, h_2, \dots : \mathbb{N} \times Sig \to \mathbb{N}$.

These isomorphisms exist due to the fact that signatures are finite, which implies that the set of signatures
is countable. The actual refinement process works as follows:

• Given an initial partition $\pi_0$ of $S$.

• Given a signature function $sig$.

• Define $\pi_{i+1}(s) = h_{i+1}(\pi_i(s), sig(\pi_i, s))$.

• Define the relation $\pi_i \subseteq S \times S$ as $s \mathrel{\pi_i} t$, if $\pi_i(s) = \pi_i(t)$.

• There exists $N \in \mathbb{N}$ such that the relation $\pi_N = \pi_{N+1}$. Define $\pi^{sig} = \pi_N$.

Note that although the definitions of the functions $\pi_{i+1}$ depend on the choice of the isomorphisms
$h_{i+1}$, the relations $\pi_i$ will be the same regardless of the choice of $h_{i+1}$, due to the third clause of
Definition 5. This definition is turned into an algorithm by starting with $\pi_i$ for $i = 0$, and computing $\pi_{i+1}$ from
$\pi_i$ until the partition is stable ($\pi_{i+1} = \pi_i$).

For the computed refinement to make sense, we need notions of signatures that correspond to
meaningful equivalences. For example, the signatures of a state according to strong bisimulation and branching
bisimulation are

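Definition 6 (classic signatures)

$sig_s(\pi,s) = \{(a,\pi(t)) \mid s \xrightarrow{a} t\}$

$sig_b(\pi,s) = \{(a,\pi(t)) \mid s \xrightarrow{\tau} s_1 \cdots \xrightarrow{\tau} s_n \xrightarrow{a} t,\ \pi(s) = \pi(s_i) \wedge (a \neq \tau \vee \pi(s) \neq \pi(t))\}$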

The signature of a state says which equivalence classes are reachable from the state by performing
an action. For example in strong bisimulation, if there is an $a$ step from a state $s$ to a state $t$ then the
equivalence class of $t$ is reachable by means of an $a$ step from $s$, which is expressed by putting the pair
$(a, \pi(t))$ in the signature of $s$.
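
The refinement process with the strong signature of Definition 6 can be sketched in Python as follows. This is a minimal sequential illustration, not the distributed implementation; the function names (`strong_signature`, `refine_strong`) and the four-state example LTS are assumptions made for this sketch. A per-round dictionary stands in for the isomorphism $h_{i+1}$.

```python
def strong_signature(succ, part, s):
    """sig_s(pi, s): pairs (action, block of target) over outgoing edges."""
    return frozenset((a, part[t]) for a, t in succ[s])


def refine_strong(states, edges):
    """Refine the trivial partition until the number of blocks is stable."""
    succ = {s: [] for s in states}
    for s, a, t in edges:
        succ[s].append((a, t))
    part = {s: 0 for s in states}          # pi_0: every state in block 0
    rounds = 0
    while True:
        table = {}                         # (old id, signature) -> new id
        nxt = {}
        for s in states:
            key = (part[s], strong_signature(succ, part, s))
            nxt[s] = table.setdefault(key, len(table))
        if len(table) == len(set(part.values())):
            return part, rounds            # no block was split: stable
        part, rounds = nxt, rounds + 1


# Hypothetical example: states 1 and 2 are strongly bisimilar.
states = [0, 1, 2, 3]
edges = [(0, "a", 1), (0, "a", 2), (1, "b", 3), (2, "b", 3)]
partition, rounds = refine_strong(states, edges)
```

Because the old block number is part of the key, every round refines the previous partition, so comparing block counts is enough to detect stability.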

The case for branching bisimulation is more complicated. The set of actions includes the invisible
action τ. The intent of this label is that whatever happens is unimportant. Thus τ steps are ignored,
except if they change the branching behaviour. An ignored τ step is called silent. More formally a τ
step is silent with respect to a partition if it is between states in the same equivalence class.

See [?] and [8] for more explanation.

2.3 Inductive signatures for strong bisimulation 

In the classical definition of the strong bisimulation signature, the signatures depend on the previous
partition only. One may wonder if in some cases the current partition can be used. The answer is yes. If
for each label you consistently use the old partition or consistently use the new partition then it still works.
Of course if we use the current partition then we must ensure that all signatures are well defined. This is
ensured if the subgraph of edges for which we use the current partition is acyclic. This is guaranteed if
we have a well-founded partition of the set of actions. A well-founded partition is a partition $A_?, A_>$ of
the set of actions, such that the relation $\{(s,t) \mid s \xrightarrow{a} t \wedge a \in A_>\}$ is well-founded:

Definition 7 A pair $(A_?, A_>)$ is a well-founded partition of $Act$ for an LTS $(S, \to, s^0)$ if $A_? \cap A_> = \emptyset$,
$A_? \cup A_> = Act$ and the LTS is $A_>$ cycle free. The order $> \subseteq S \times S$ is defined by $> = (\bigcup_{a \in A_>} \xrightarrow{a})^{+}$.

Based on the well-founded order > we can give inductive definitions and proofs. For example, we 
can define inductive strong bisimulation signatures: 

Definition 8 (inductive strong bisimulation) Given an LTS $(S, \to, s^0)$, a well-founded partition $(A_?, A_>)$
for it, an initial partition function $\pi_0 : S \to \mathbb{N}$ and isomorphisms $h_1, h_2, \dots : \mathbb{N} \times Sig \to \mathbb{N}$. Define

$sig_{i+1}(s) = \{(a, \pi_i(t)) \mid s \xrightarrow{a} t \wedge a \in A_?\} \cup \{(a, \pi_{i+1}(t)) \mid s \xrightarrow{a} t \wedge a \in A_>\}$
$\pi_{i+1}(s) = h_{i+1}(\pi_i(s), sig_{i+1}(s))$

Note that $sig_{i+1}(s)$ is defined inductively in terms of any $\pi_i$ values, and only $\pi_{i+1}$ values of states
that are smaller in $>$. To show how the definition works and how the choice of the partition influences
performance, we continue with an example.

Example 9 Consider the following LTS:

[Figure: an LTS with six states 0, ..., 5 and transitions labeled a and b.]

If we take $A_> := \{a\}$, and set $\pi_0(s) := 0$ for all states, we get the following run:



$sig_1(5) = \{(b,0)\}$, $\pi_1(5) = 1$
$sig_1(4) = \{(b,0),(a,1)\}$, $\pi_1(4) = 2$
$sig_1(3) = \{(a,2)\}$, $\pi_1(3) = 3$
$sig_1(2) = \{(b,0),(a,3)\}$, $\pi_1(2) = 4$
$sig_1(1) = \{(b,0),(a,4)\}$, $\pi_1(1) = 5$
$sig_1(0) = \{(a,5)\}$, $\pi_1(0) = 6$

Note that every state got a different signature, so in this case we reach the final partition in one round.
Also note that the order of computation was completely fixed, because the label $a$ imposes a total order
on the states.

Next, consider the same example, but let $A_> = \{b\}$. Note that this is also terminating. Again, we take
$\pi_0(s) = 0$ for any state $s$.

$sig_1(0) = \{(a,0)\}$, $\pi_1(0) = 1$;  $sig_1(3) = \{(a,0)\}$, $\pi_1(3) = 1$
$sig_1(1) = \{(a,0),(b,1)\}$, $\pi_1(1) = 2$;  $sig_1(4) = \{(a,0),(b,1)\}$, $\pi_1(4) = 2$
$sig_1(2) = \{(a,0),(b,2)\}$, $\pi_1(2) = 3$;  $sig_1(5) = \{(b,2)\}$, $\pi_1(5) = 4$

$sig_2(0) = \{(a,2)\}$, $\pi_2(0) = 5$;  $sig_2(3) = \{(a,2)\}$, $\pi_2(3) = 5$
$sig_2(1) = \{(a,3),(b,5)\}$, $\pi_2(1) = 6$;  $sig_2(4) = \{(a,4),(b,5)\}$, $\pi_2(4) = 7$
$sig_2(2) = \{(a,1),(b,6)\}$, $\pi_2(2) = 8$;  $sig_2(5) = \{(b,7)\}$, $\pi_2(5) = 9$

$sig_3(0) = \{(a,6)\}$, $\pi_3(0) = 10$;  $sig_3(3) = \{(a,7)\}$, $\pi_3(3) = 11$
$sig_3(1) = \{(a,8),(b,10)\}$, $\pi_3(1) = 12$;  $sig_3(4) = \{(a,9),(b,11)\}$, $\pi_3(4) = 13$
$sig_3(2) = \{(a,5),(b,12)\}$, $\pi_3(2) = 14$;  $sig_3(5) = \{(b,13)\}$, $\pi_3(5) = 15$

Note that this time we need three iterations, but there is some room for parallel computation, because
the signatures of 0 and 3 can be computed independently, because they have no $b$ successors.
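
The two runs of Example 9 can be replayed with a small Python sketch of Definition 8. This is an illustration under assumptions: the edge list below is reconstructed from the signatures listed in the example (the figure itself is not reproduced here), the function names are hypothetical, and block numbers are allocated per round, so they differ from the numbers in the example while the induced partitions are the same.

```python
def inductive_round(states, succ, terminating, prev):
    """One round of inductive strong signatures (sketch of Definition 8)."""
    new, table = {}, {}

    def visit(s):
        if s not in new:
            # Labels in A_> use the *new* id of the target (computed first,
            # which terminates because A_> is cycle free); others use the old id.
            sig = frozenset(
                (a, visit(t) if a in terminating else prev[t])
                for a, t in succ[s])
            new[s] = table.setdefault((prev[s], sig), len(table))
        return new[s]

    for s in states:
        visit(s)
    return new


def refine_inductive(states, edges, terminating):
    """Iterate rounds; return the final partition and block counts per round."""
    succ = {s: [] for s in states}
    for s, a, t in edges:
        succ[s].append((a, t))
    part = {s: 0 for s in states}
    block_counts = []
    while True:
        nxt = inductive_round(states, succ, terminating, part)
        if len(set(nxt.values())) == len(set(part.values())):
            return part, block_counts
        part = nxt
        block_counts.append(len(set(part.values())))


# Edges reconstructed from the signatures in Example 9 (an assumption).
states = list(range(6))
edges = [(i, "a", i + 1) for i in range(5)]
edges += [(1, "b", 0), (2, "b", 1), (4, "b", 3), (5, "b", 4)]

_, counts_a = refine_inductive(states, edges, {"a"})   # A_> = {a}
_, counts_b = refine_inductive(states, edges, {"b"})   # A_> = {b}
```

With $A_> = \{a\}$ the sketch separates all six states in a single round; with $A_> = \{b\}$ the block count grows over three rounds, matching the two runs above.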

2.4 Inductive signatures for branching bisimulation 

In the splitting procedure of the Groote-Vaandrager algorithm, whenever a state has one or more τ
successors inside the block that is being split, the algorithm tests if the behavior of one of those τ successors
includes all of the behavior of the state. If such a successor exists, then the state is put in the same block
as that successor. Because of this splitting procedure the graph has to be τ-cycle free. A similar effect
can be achieved by exploiting τ cycle freeness when we define the branching signature. Thus, we assume
that $\tau \in A_>$ for all partitions $(A_?, A_>)$.

The inductive branching signature is computed in two steps. First, the pre-signature is computed,
which consists of all transitions to all successors, including τ-steps to possibly equivalent states. Second,
we look for a τ-successor in the same block of the previous partition which contains all pre-signature behavior
except the τ step to that successor. If such a successor is found then the signature is the signature of that
successor, otherwise the signature is the pre-signature:

Definition 10 (inductive branching bisimulation) Given an LTS $(S, \to, s^0)$, a well-founded partition
$(A_?, A_>)$ for it with $\tau \in A_>$ and an initial partition function $\pi_0 : S \to \mathbb{N}$. Define

$pre_{i+1}(s) = \{(a, \pi_i(t)) \mid s \xrightarrow{a} t \wedge a \in A_?\} \cup \{(a, \pi_{i+1}(t)) \mid s \xrightarrow{a} t \wedge a \in A_>\}$

$sig_{i+1}(s) =$ if there exists a $t$ with $s \xrightarrow{\tau} t$, $\pi_i(s) = \pi_i(t)$ and $pre_{i+1}(s) \subseteq sig_{i+1}(t) \cup \{(\tau, \pi_{i+1}(t))\}$
    then $sig_{i+1}(t)$
    else $pre_{i+1}(s)$

$\pi_{i+1}(s) = h_{i+1}(\pi_i(s), sig_{i+1}(s))$

It is not immediately obvious that this is well-defined: what if there exists more than one τ-successor
that passes the test? The answer is: then they have the same signature. We prove this in lock step with
the observation that if a signature $\sigma$ contains a pair $(a,n)$, then any state with signature $\sigma$ has a path of
silent τ steps to a state where an $a$ step is possible to a final state in partition $n$.

To avoid unnecessary case distinctions between $a \in A_?$ and $a \in A_>$, we introduce the notation

$\hat{a} \stackrel{def}{=} \begin{cases} 0, & \text{if } a \in A_? \\ 1, & \text{if } a \in A_> \end{cases}$

This allows us to abbreviate "$\pi_i(s)$ if $a \in A_?$ and $\pi_{i+1}(s)$ if $a \in A_>$" by $\pi_{i+\hat{a}}(s)$. Due to space restrictions,
we only sketch the essentials of the proofs. Full proofs can be found in [6].

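Proposition 11 For all states $s$:

1. If there exist $t_1, t_2$ with $s \xrightarrow{\tau} t_1$, $s \xrightarrow{\tau} t_2$, $\pi_i(s) = \pi_i(t_1) = \pi_i(t_2)$, $pre_{i+1}(s) \subseteq sig_{i+1}(t_1) \cup \{(\tau, \pi_{i+1}(t_1))\}$
and $pre_{i+1}(s) \subseteq sig_{i+1}(t_2) \cup \{(\tau, \pi_{i+1}(t_2))\}$ then $sig_{i+1}(t_1) = sig_{i+1}(t_2)$.

2. If $(a,n) \in sig_{i+1}(s)$ then $\exists s_1, \dots, s_m, t : s \xrightarrow{\tau} s_1 \xrightarrow{\tau} \cdots \xrightarrow{\tau} s_m \xrightarrow{a} t \wedge \pi_i(s) = \pi_i(s_j) \wedge n = \pi_{i+\hat{a}}(t)$.

Proof. We prove both parts at once by induction on the well-founded order $>$.

Given a state $s$, we prove part 1 by contradiction. Suppose that $sig_{i+1}(t_1) \neq sig_{i+1}(t_2)$.

By definition $\{(\tau, \pi_{i+1}(t_1)), (\tau, \pi_{i+1}(t_2))\} \subseteq pre_{i+1}(s)$. This implies that $(\tau, \pi_{i+1}(t_1)) \in sig_{i+1}(t_2)$ and
$(\tau, \pi_{i+1}(t_2)) \in sig_{i+1}(t_1)$. By using part 2, we construct an infinite path $s \xrightarrow{\tau} t_1 = s_1 \xrightarrow{\tau}^{+} s'_1 \xrightarrow{\tau}^{+} s_2 \xrightarrow{\tau}^{+} \cdots$
with $\pi_{i+1}(s_j) = \pi_{i+1}(t_1)$ and $\pi_{i+1}(s'_j) = \pi_{i+1}(t_2)$. This infinite path contradicts the cycle freeness.

The proof of part 2 is elementary. □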

We will show how the new definition works and is different from the approach of [8], by means of
an example:

Example 12 Consider the following three examples. We have only drawn the nodes of the graphs which
are relevant. Let $\pi_0(s) = 0$ for all $s$ and $\pi_i(s) = 0$ for all nodes $s$ which have been omitted.




Let $A_> = \{\tau\}$. Then for the left-most LTS, we get:

$pre_1(2) = \{(a,0),(b,0),(c,0)\}$
$sig_1(2) = \{(a,0),(b,0),(c,0)\}$, $\pi_1(2) = 1$
$pre_1(1) = \{(a,0),(b,0),(\tau,1)\}$. Note: $\{(a,0),(b,0),(\tau,1)\} \subseteq \{(a,0),(b,0),(c,0),(\tau,1)\}$
$sig_1(1) = \{(a,0),(b,0),(c,0)\}$, $\pi_1(1) = 1$
$pre_1(0) = \{(a,0),(\tau,1)\}$. Note: $\{(a,0),(\tau,1)\} \subseteq \{(a,0),(b,0),(c,0),(\tau,1)\}$
$sig_1(0) = \{(a,0),(b,0),(c,0)\}$, $\pi_1(0) = 1$

Note that $|dom(sig_1)| = |dom(sig_0)| = 1$, so $sig_1$ is stable, and all τ-steps are silent.
For the middle LTS, we obtain:



$pre_1(2) = \{(a,0),(b,0),(c,0)\}$
$sig_1(2) = \{(a,0),(b,0),(c,0)\}$, $\pi_1(2) = 1$
$pre_1(1) = \{(a,0),(b,0)\}$
$sig_1(1) = \{(a,0),(b,0)\}$, $\pi_1(1) = 2$
$pre_1(0) = \{(a,0),(\tau,1),(\tau,2)\}$. Note: $\{(a,0),(\tau,1),(\tau,2)\} \not\subseteq \{(a,0),(b,0),(c,0),(\tau,1)\}$ and
$\{(a,0),(\tau,1),(\tau,2)\} \not\subseteq \{(a,0),(b,0),(\tau,2)\}$
$sig_1(0) = \{(a,0),(\tau,1),(\tau,2)\}$, $\pi_1(0) = 3$

Note that $|dom(sig_1)| = 3$, which cannot increase, so again $sig_1$ is stable. In this case, none of the τ-steps
is silent.

For the LTS on the right, we get

$sig_1(2) = \{(a,0)\}$, $\pi_1(2) = 1$;  $sig_1(3) = \{(b,0)\}$, $\pi_1(3) = 2$
$sig_1(1) = \{(\tau,1),(\tau,2)\}$, $\pi_1(1) = 3$;  $sig_1(0) = \{(a,0),(b,0),(\tau,3)\}$, $\pi_1(0) = 4$

Already after one iteration it is detected that none of the τ-steps is silent. In the original definition in [8],
this would be detected later, as the following example shows.

$sig_{b,1}(2) = \{(a,0)\}$, $\pi_1(2) = 1$;  $sig_{b,1}(3) = \{(b,0)\}$, $\pi_1(3) = 2$
$sig_{b,1}(1) = \{(a,0),(b,0)\}$, $\pi_1(1) = 3$;  $sig_{b,1}(0) = \{(a,0),(b,0)\}$, $\pi_1(0) = 3$

$sig_{b,2}(2) = \{(a,0)\}$, $\pi_2(2) = 1$;  $sig_{b,2}(3) = \{(b,0)\}$, $\pi_2(3) = 2$
$sig_{b,2}(1) = \{(\tau,1),(\tau,2)\}$, $\pi_2(1) = 4$;  $sig_{b,2}(0) = \{(a,0),(b,0),(\tau,1),(\tau,2)\}$, $\pi_2(0) = 5$

Note the two differences between inductive and classic signatures. First, the fact that $0 \xrightarrow{\tau} 1$ is not silent
is detected in the first iteration by inductive and in the second by classic signatures. Second, in the inductive
case the size of the signature is limited by the number of outgoing transitions; in the classic case it is not.
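
A single round of Definition 10 can be sketched in Python to replay the left-most and right-most cases of Example 12. The sketch rests on assumptions: it fixes $A_> = \{\tau\}$, the edge lists are reconstructed from the pre-signatures and signatures listed in the example, node 9 is a hypothetical sink standing in for the omitted nodes, and block numbers are allocated in visiting order, so they differ from the numbers used in the example.

```python
TAU = "tau"


def branching_round(states, edges, prev):
    """One round of inductive branching signatures with A_> = {tau}."""
    succ = {s: [] for s in states}
    for s, a, t in edges:
        succ[s].append((a, t))
    new, sig, table = {}, {}, {}

    def visit(s):
        if s in new:
            return
        for a, t in succ[s]:
            if a == TAU:
                visit(t)               # tau-successors first (tau is cycle free)
        # Pre-signature: new ids for tau targets, old ids for all other targets.
        pre = frozenset((a, new[t] if a == TAU else prev[t]) for a, t in succ[s])
        chosen = pre
        for a, t in succ[s]:
            if a == TAU and prev[t] == prev[s] and pre <= sig[t] | {(TAU, new[t])}:
                chosen = sig[t]        # s inherits the signature of t
                break
        sig[s] = chosen
        new[s] = table.setdefault((prev[s], chosen), len(table))

    for s in states:
        visit(s)
    return new, sig


# Left-most LTS of Example 12 (reconstructed; 9 is a hypothetical sink).
left = [(2, "a", 9), (2, "b", 9), (2, "c", 9), (1, "a", 9), (1, "b", 9),
        (1, TAU, 2), (0, "a", 9), (0, TAU, 1)]
new_left, _ = branching_round([0, 1, 2, 9], left, {s: 0 for s in [0, 1, 2, 9]})

# Right-most LTS of Example 12 (reconstructed in the same way).
right = [(2, "a", 9), (3, "b", 9), (1, TAU, 2), (1, TAU, 3),
         (0, "a", 9), (0, "b", 9), (0, TAU, 1)]
new_right, _ = branching_round([0, 1, 2, 3, 9], right,
                               {s: 0 for s in [0, 1, 2, 3, 9]})
```

In the left-most case states 0, 1 and 2 end up in one block, so all τ-steps are silent; in the right-most case states 0 to 3 are pairwise separated after this one round. Taking the first τ-successor that passes the test is safe because, by Proposition 11, all such successors have the same signature.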



2.5 Correctness 

We use the same proof technique as in previous work. That is, we prove that bisimilar states are always
in the same block and that if a $\pi_i$ partition is stable ($\pi_i$ and $\pi_{i+1}$ denote the same relation) then $\pi_i$ is
a bisimulation. Thus because $\underline{\leftrightarrow}$ is the coarsest bisimulation, we must have that $\pi_i$ coincides with $\underline{\leftrightarrow}$.
Again, we include proof sketches only. Full proofs are available in [6].

In this section we work on a given LTS $(S, \to, s^0)$ and well-founded partition $(A_?, A_>)$, with $\tau \in A_>$.
We consider inductive branching bisimulation and we let $s \underline{\leftrightarrow}_i t$ denote $\pi_i(s) = \pi_i(t)$.

One of the properties of a τ-cycle free LTS is that given a state one can always follow τ steps to
bisimilar states, until a state is found that has no such step. These states are called canonical:

Definition 13 A state $s$ is canonical (denoted $s{\downarrow}$) if $\neg\exists s' : s \xrightarrow{\tau} s' \wedge s \underline{\leftrightarrow} s'$.

Canonical states have the important property that all visible behavior is present as an immediate step
rather than as a sequence of one or more invisible steps followed by a visible step.

Lemma 14 If $\underline{\leftrightarrow} \subseteq \underline{\leftrightarrow}_i$ then for all states $s, t$ we have $(s \underline{\leftrightarrow} t \wedge t{\downarrow}) \Rightarrow s \underline{\leftrightarrow}_{i+1} t$.

To prove this, we need two properties.

Proposition 15 For all states $s,t$, we have

1. $pre_{i+1}(s) \subseteq sig_{i+1}(s) \cup \{(\tau, \pi_{i+1}(s))\}$.

2. $pre_{i+1}(s) \subseteq sig_{i+1}(s) \cup \{(\tau, \pi_{i+1}(s))\}$.

Proof. By distinguishing cases depending on which branch was taken in the if-then-else of the definition 
of inductive signature. □ 



Proof of Lemma 14 By induction on the order $(s,t) > (s',t')$ iff $s > s' \wedge t > t'$.

Because any transition in $s$ is either matched by a transition of $t$, or it is a silent τ step, we have

$pre_{i+1}(s) \subseteq pre_{i+1}(t) \cup \{(\tau, \pi_{i+1}(t))\}$

Now, we distinguish on whether $s$ is canonical or not.

• $s{\downarrow}$: In this case $pre_{i+1}(s) = pre_{i+1}(t)$, due to the fact that bisimilar canonical states have the same
transitions. This implies $sig_{i+1}(s) = sig_{i+1}(t)$ and thus $s \underline{\leftrightarrow}_{i+1} t$.

• $s \xrightarrow{\tau} s' \wedge s \underline{\leftrightarrow} s'$: By induction hypothesis $sig_{i+1}(s') = sig_{i+1}(t)$. Thus

$pre_{i+1}(s) \subseteq pre_{i+1}(t) \cup \{(\tau, \pi_{i+1}(t))\} \subseteq sig_{i+1}(t) \cup \{(\tau, \pi_{i+1}(t))\} = sig_{i+1}(s') \cup \{(\tau, \pi_{i+1}(s'))\}$

and therefore $sig_{i+1}(s) = sig_{i+1}(s')$.

□

Lemma 16 If for all $s,t$: $s \underline{\leftrightarrow}_i t \Leftrightarrow s \underline{\leftrightarrow}_{i+1} t$ then $\underline{\leftrightarrow}_i$ is a branching bisimulation.

Proof. Corollary of Proposition 11, part 2. □



3 Distributed Algorithm 

In this section, we present a distributed algorithm for computing the branching bisimulation equivalence
relation.

The input to the algorithm is an LTS $(S, \to, s^0)$, a well-founded partition $(A_?, A_>)$, and a function
$owner : S \to \{1, \dots, W\}$ where $W$ is the number of workers. The owner function is a given distribution
of states among the workers.

The given isomorphisms of the theory are replaced by global hash tables in the implementation.
Each worker stores an equal part of this global hash table. The worker where the (new) ID of the pair
(oldID, signature) is stored is given by the second owner function $owner : ID \times Sig \to \{1, \dots, W\}$.
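
The sharded ID table can be sketched in Python as follows. This is an illustration only: the class and function names are hypothetical, workers are numbered from 0 instead of 1, and Python's built-in `hash` stands in for the second owner function. The ID scheme (`index_count * workers + me`) follows `indexed_set_put` in Table 2, and summing the per-shard counters mirrors `distributed_sum(index_count)` in Table 1.

```python
class IndexedSetShard:
    """One worker's share of the global (old ID, signature) -> new ID table."""

    def __init__(self, me, workers):
        self.me, self.workers = me, workers
        self.index_count = 0
        self.index_table = {}

    def clear(self):
        self.index_count = 0
        self.index_table = {}

    def put(self, pair):
        if pair not in self.index_table:
            # Interleave IDs so that two shards never hand out the same number.
            self.index_table[pair] = self.index_count * self.workers + self.me
            self.index_count += 1
        return self.index_table[pair]


def owner(pair, workers):
    """Hypothetical owner function: pick a shard by hashing the pair."""
    return hash(pair) % workers


workers = 3
shards = [IndexedSetShard(me, workers) for me in range(workers)]
pairs = [(0, frozenset({("a", 0)})),
         (0, frozenset({("b", 0)})),
         (0, frozenset({("a", 0)}))]       # same pair as the first one
ids = [shards[owner(p, workers)].put(p) for p in pairs]
total_blocks = sum(shard.index_count for shard in shards)
```

Equal pairs always reach the same shard and therefore get the same ID, while different pairs get different IDs without any coordination between shards.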

In the actual implementation states and edges are numbered entities. Since the theory assumes that 
edges are triples, we need to introduce some new notation. Moreover, we have to distinguish which 
worker owns which state and which edge, so we need some notation for that as well. 

The functions $src$, $dst$ and $lbl$ provide access to the source state, destination state and label of an
edge, respectively:

$\forall e = (s,a,t) \in \to : src(e) = s$, $lbl(e) = a$ and $dst(e) = t$.

Each worker owns a set of states and needs to know the outgoing τ edges, $A_?$ edges and $A_>$ edges:

$S_w = \{s \in S \mid owner(s) = w\}$,  $E^{\tau}_w = \{e \in \to \mid src(e) \in S_w \wedge lbl(e) = \tau\}$

$E^{?}_w = \{e \in \to \mid src(e) \in S_w \wedge lbl(e) \in A_?\}$,  $E^{>}_w = \{e \in \to \mid src(e) \in S_w \wedge lbl(e) \in A_>\}$

Finally, we need the definitions of successor and predecessor edges of a state:

$succ(s) = \{e \mid src(e) = s\}$,  $pred(s) = \{e \mid dst(e) = s\}$

Table 1: Pseudo code for worker w (inductive branching bisimulation reduction)

 1  set sig[S_w], dest_sig[E^τ_w], old_queue, sig_queue, new_queue
 2  int old_id[S_w], current_id[S_w], dst_old[E^?_w ∪ E^τ_w], dst_new[E^>_w]
 3  proc reduce()
 4    int old_count := 0, new_count := 1
 5    for t ∈ S_w do current_id[t] := 0 end
 6    while old_count ≠ new_count do
 7      old_count := new_count; indexed_set_clear()
 8      for t ∈ S_w do old_id[t] := current_id[t]; current_id[t] := ⊥ end
 9      for e in E^?_w do dst_old[e] := ⊥ end; for e in E^>_w do dst_new[e] := ⊥ end
10      for e in E^τ_w do dest_sig[e] := ⊥; dst_old[e] := ⊥ end
11      old_queue := S_w; sig_queue := {s ∈ S_w | ¬∃a,t : s -a-> t}; new_queue := ∅
12      do
13      :: take s from old_queue =>
14           for e in pred(s) with lbl(e) ∈ A_? ∪ {τ} do
15             send set_old(e, old_id[s]) to owner(src(e)) end
16      :: recv set_old(e, id) => dst_old[e] := id; check_ready(src(e))
17      :: take s from sig_queue =>
18           sig := compute_sig(s);
19           for e in pred(s) with lbl(e) = τ do
20             send set_sig(e, sig) to owner(src(e)) end
21           send get_global(s, old_id[s], sig) to owner(old_id[s], sig)
22      :: recv set_sig(e, e_sig) => dest_sig[e] := e_sig; check_ready(src(e))
23      :: recv get_global(s, id_old, sig) =>
24           send set_global(s, indexed_set_put(id_old, sig)) to owner(s)
25      :: recv set_global(s, id) => current_id[s] := id; add s to new_queue
26      :: take s from new_queue =>
27           for e in pred(s) with lbl(e) ∈ A_> do
28             send set_new(e, current_id[s]) to owner(src(e)) end
29      :: recv set_new(e, id) => dst_new[e] := id; check_ready(src(e))
30      until ∀s ∈ S : current_id[s] ≠ ⊥
31      new_count := distributed_sum(index_count)
32    end
33  end



Each worker stores both incoming and outgoing edges of the states it owns in a way that allows it to quickly
enumerate the successors and predecessors of every state.

Next, we will explain our algorithm for distributed computation of inductive signatures. Pseudo code
of the main loop can be found in Table 1. It leaves out the details of the signature computation and global
hash table. These details can be found in Table 2. The algorithm works in a few steps:

1. Put the initial partition (every state is equivalent) in the current partition and start the first iteration.
(See Table 1, lines 4-5.)

2. Initialize the data structure needed in each iteration. That is, set the values of the successor partition
IDs and signatures to undefined, clear the global hash table, clear the signature and new ID queues
and put all states in the old ID queue. (See Table 1, lines 7-11.)



3. If a state is in the old ID queue it means that the ID with respect to the previous partition has to
be forwarded to the predecessors. This is done by sending a message for every incoming $A_?$ or
τ edge. (See Table 1, lines 13-15.) If such a message is received then the old ID is stored and if
necessary the state is put in the signature queue. (See Table 1, line 16.)

4. If a state is in the signature queue then all information needed to compute the signature is present.
Once the signature has been computed it is sent to all τ predecessors and a request is sent to the
global hash table to resolve the ID of the (oldID, signature) pair. (See Table 1, lines 17-21.) If
a signature set request is received then the signature is set and if necessary the state is put in the
signature queue. (See Table 1, line 22.) If a hash table request is received then the lookup is made
and the reply is sent immediately. (See Table 1, lines 23-24.) Upon receiving the reply, the state is
put in the new ID queue. (See Table 1, line 25.)

5. If a state is in the new ID queue then the ID in the current partition is ready to be sent to all $A_>$
predecessors. (See Table 1, lines 26-28.) Receiving such a message leads to storing the result and
possibly inserting the state in the signature queue. (See Table 1, line 29.)

6. As soon as the new partition ID of every state is known everywhere, the message loop can exit.
Note that this requires a simple form of distributed termination detection.

7. By adding up the share of every partition ID hash table, we compute the number of partitions and
we repeat the loop if necessary.

S.C.C. Blom, J.C. van de Pol

Table 2: Subroutines for inductive branching minimization.

 1  proc check_ready(s)
 2    for e in succ(s) do
 3      if dest_id[e] = ⊥ or lbl(e) = τ ∧ dest_sig[e] = ⊥ then return end
 4    end
 5    add s to sig_queue
 6  end
 7  set compute_sig(s)
 8    pre := ∅
 9    for e in succ(s) ∩ E^?_w do pre := pre ∪ {(lbl(e), dst_old[e])} end
10    for e in succ(s) ∩ E^>_w do pre := pre ∪ {(lbl(e), dst_new[e])} end
11    for e in succ(s) with lbl(e) = τ and dest_id[s] = dst_old[e] do
12      if pre ⊆ dest_sig[e] ∪ {(τ, dst_new[e])} then return dest_sig[e] end
13    end
14    return pre
15  end
16  int index_count := 0; hashtable index_table := ∅
17  proc indexed_set_clear() index_count := 0; index_table := ∅ end
18  int indexed_set_put(pair)
19    if index_table[pair] = ⊥ then
20      index_table[pair] := index_count * workers + me; index_count++ end
21    return index_table[pair]
22  end

As described above, messages from the old queue, signature queue and new queue are dealt with in 
parallel until finished. The actual implementation deals with these messages in waves: first the entire old 
queue is dealt with then the signature queue and new queue are emptied globally in sub iterations. 

Before we discuss the experiments with our prototype implementation, we first discuss the time,
memory and message complexity. For this analysis we assume that the fan out of every state is bounded.
We assume an LTS with $N$ states and $M$ transitions.






Inductive Branching Signatures 



The time needed for the algorithm is the number of iterations times the cost of each iteration. The
worst case number of iterations is the number of states $N$. (E.g. for the LTS
$(\{0, \dots, N-1\}, \{i \xrightarrow{a} i+1 \bmod N\} \cup \{0 \xrightarrow{b} 0\}, 0)$.) In each iteration, for each state we must compute the
signature and insert it in the global hash table. Due to the fact that the fan out is constant, this requires
$O(N)$ time and messages. For each edge, we may have to send the old ID, the new ID and the signature.
This requires $O(M)$ time and messages. Overall, the worst case time complexity is $O(N \cdot (N + M))$.
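
The growth of the iteration count on a ring-shaped worst-case LTS can be checked with a short Python sketch. It is an illustration under assumptions: the two labels `a` and `b` are chosen for this sketch, no label is terminating on a ring, so plain strong signatures with respect to the previous partition are used, and the function name is hypothetical.

```python
def count_refining_rounds(n):
    """Rounds of signature refinement that still split a block on the ring."""
    # Ring of a-steps over n states, plus a b-loop on state 0.
    succ = {i: [("a", (i + 1) % n)] for i in range(n)}
    succ[0].append(("b", 0))
    part = {i: 0 for i in range(n)}
    rounds = 0
    while True:
        table = {}                     # (old id, signature) -> new id
        nxt = {}
        for s in range(n):
            sig = frozenset((a, part[t]) for a, t in succ[s])
            nxt[s] = table.setdefault((part[s], sig), len(table))
        if len(table) == len(set(part.values())):
            return rounds              # stable: this round split nothing
        part, rounds = nxt, rounds + 1


rounds_needed = [count_refining_rounds(n) for n in (2, 4, 8, 16)]
```

Each round separates exactly one more state from the remaining block, so the sketch performs $N - 1$ splitting rounds plus one final round that detects stability.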

The number of times one cannot avoid waiting for a message in each iteration depends on the length 
of the longest A> path in the graph: computation has to start at the last node and work up to the first, 
incurring three message latencies at each step. 

The memory needed by the algorithm to store the LTS and the signatures is linear in the number of 
states and transitions: $O(N + M)$. (This differs from the old algorithm where, even if the fan-out 
was bounded, the size of many signatures could be in the order of the number of edges.) Provided that 
the owner functions work well, the memory use is evenly distributed across all workers. The memory 
needed for message buffering can be kept constant, because each step that involves sending more than 
one message is a step where a state has to be taken from a queue. Blocking these steps if the number of 
messages in the system is above a threshold limits the number of messages to that threshold. Overall, the 
worst case memory complexity of the algorithm is $O(N + M)$. 
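
The effect of blocking queue steps on the number of buffered messages can be illustrated with a toy model. The sketch below is a single-process simulation under our own assumptions, not the message-passing code of the prototype: one worker dequeues states, each dequeue sends a given number of messages, and the network delivers a fixed number of messages per tick. The names `peak_in_flight`, `fanouts`, `threshold` and `delivered_per_tick` are hypothetical. In this model a dequeue is blocked while at least `threshold` messages are in flight, so the peak stays below the threshold plus the maximal fan-out, which is a constant when the fan-out is bounded.

```python
from collections import deque


def peak_in_flight(fanouts, threshold, delivered_per_tick=1):
    """Return the largest number of undelivered messages seen in the run.

    fanouts[i] is the number of messages sent when state i is dequeued.
    """
    if threshold < 1 or delivered_per_tick < 1:
        raise ValueError("threshold and delivered_per_tick must be positive")
    queue = deque(fanouts)
    in_flight = 0
    peak = 0
    while queue or in_flight:
        if queue and in_flight < threshold:
            # Allowed to take a state from the queue and send its messages.
            in_flight += queue.popleft()
            peak = max(peak, in_flight)
        else:
            # Blocked (or nothing left to send): let the network deliver.
            in_flight -= min(in_flight, delivered_per_tick)
    return peak


if __name__ == "__main__":
    work = [3] * 100                      # 100 states with fan-out 3
    print(peak_in_flight(work, threshold=10))
    print(peak_in_flight(work, threshold=10**9))   # effectively unthrottled
```

Without the threshold, the peak grows with the amount of queued work; with it, the peak depends only on the threshold and the fan-out bound.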

The worst case memory is also the expected memory complexity, since we expect to keep the LTS 
in memory. The expected time complexity is much lower than the worst case: The expected number of 
iterations and the expected length of the longest A> path are orders of magnitude less than the number 
of states. 

4 Experimental Evaluation 

To study the performance of the implementation of the new algorithm, we use four models. We perform 
two tests on these models. First, we compare with existing branching bisimulation reduction tools. 
Second, we test how well the new implementation scales in the number of compute nodes and cores 
used per node. In addition, we briefly mention work in progress on inductive strong bisimulation. 

The models that we use in our experiments are: 

lift6 A distributed lift system [14]. This model describes a system that can lift large vehicles by using 
one leg for each wheel of the vehicle. These legs are connected in a ring topology. The instance 
we used has 6 legs. 

swp6 A version of the sliding window protocol |QQ. It has 2 data elements, the channels can contain at 
most one element and the window size is 6. 

fr53 A model of Franklin's leader election protocol for anonymous processes along a bidirectional ring 
of asynchronous channels, which terminates with probability one EQTI. We chose an instance 
with 5 nodes and 3 identities. 

1394fin Model of the physical layer service of the 1394 or firewire protocol and also the link layer 
protocol entities [17, 20]. We use an instance with 3 links and 1 data element. 

The sizes of these models, in their original, cycle eliminated and branching reduced forms are shown 
in Table 3. This table also shows the number of iterations needed by classic branching (c.b.), inductive 
branching (i.b.), classic strong (c.s.), inductive strong (i.s.) and the length of the longest τ path (p). Note 
that in two cases (lift6 and 1394fin) the number of iterations needed by the inductive branching algorithm 
is less than the number needed by the classical algorithm. Also note that the number of iterations needed 
for inductive strong bisimulation is always a lot less. It will be interesting to see if we get similar results 
if we use real input graphs and A>, instead of τ-cycle reduced graphs and A> = {τ}. 

Table 3: Problem sizes 

| model   | original states | original trans. | cycle free states | cycle free trans. | branching states | branching trans. | c.b. | i.b. | c.s. | i.s. | p   |
|---------|-----------------|-----------------|-------------------|-------------------|------------------|------------------|------|------|------|------|-----|
| lift6   | 33,949,609      | 165,318,222     | 33,946,699        | 165,312,102       | 12,463           | 71,466           | 16   | 8    | 91   | 7    | 78  |
| swp6    | 56,793,060      | 271,366,320     | 13,606,212        | 56,996,856        | 8,191            | 16,380           | 13   | 13   | 20   | 13   | 51  |
| 1394fin | 88,221,818      | 152,948,696     | 86,692,394        | 148,537,294       | 26,264           | 79,002           | 7    | 5    | 91   | 6    | 75  |
| fr53    | 84,381,157      | 401,681,445     | 81,115,587        | 385,379,715       | 2                | 1                | 2    | 2    |      |      | 196 |

In Table 4 we show the results of the comparison. The tools in the comparison are: 

bcg_min The reduction tool from the CADP toolset [12]. Version 1.7 from the 2007q beta release, 64 
bit installation. This implements the algorithm from [15], for which first the τ-cycles must be 
eliminated (ce). 

ltsmin sequential The reduction tool which is released as part of the μCRL toolset Q. We additionally 
implemented a sequential version of the inductive branching bisimulation algorithm in this tool. 

ltsmin distributed A distributed implementation, which contains the classic distributed branching 
bisimulation reduction algorithm from [8], and the newly implemented inductive branching bisimulation 
reduction algorithm. 

For bcg_min, we show the total time needed for reading the input, reducing and writing the output. 
For ltsmin sequential, we show both the total time and the time needed for reduction. For ltsmin 
distributed classic, we show the reduction time (wall clock time). For ltsmin distributed inductive, we show 
the time for sequential cycle elimination and the wall clock time of distributed reduction. In all cases we 
additionally show the total memory requirements in MB. The tests were performed on a dual quad core 
Xeon 3GHz machine with 48GB memory. 

Several conclusions can be drawn from the results. By looking at the results for sequential ltsmin, we 
can conclude that inductive signatures are better than classic signatures. By looking at the times needed 
for fr53 it is obvious that this implementation of cycle elimination in ltsmin should be improved. 

We can also conclude that on these cases, sequential ltsmin uses much less memory than bcg_min for 
branching bisimulation. With the exception of fr53, sequential ltsmin is also much faster than bcg_min. 
Note that the differences in time/memory are partially due to differences in implementation. For instance, 
bcg_min uses 64 bit pointers to represent partitions, whereas ltsmin uses 32 bit integers. 

It is also clear that the distributed tool is much more expensive in time and memory than the sequential 
tool. The extra cost in memory is easily explained. In sequential ltsmin, signature IDs are stored per state 
only. In distributed ltsmin they have to be stored per state and per transition. In sequential ltsmin the LTS 
itself takes 4 bytes per state and 8 bytes per transition (label and state). In distributed ltsmin it takes 8 
bytes per state and 24 bytes per transition (label, owner and state for ingoing and outgoing edges). This 
means that distributed ltsmin has to work through roughly 3 times as much data in each iteration, which 
might take up to 3 times as much time. Frequent synchronization between the workers and having to send 
and receive information that in sequential ltsmin can simply be accessed is expected to account for a lot 
of time. 
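
As a rough check of the factor of three, the two storage layouts quoted above (4 bytes per state and 8 bytes per transition, versus 8 bytes per state and 24 bytes per transition) can be applied to the cycle-free lift6 graph from Table 3. The sketch below counts only the LTS itself and ignores signature IDs, hash tables and message buffers; the function and constant names are ours and are not taken from ltsmin.

```python
def lts_bytes(states, transitions, bytes_per_state, bytes_per_transition):
    """Bytes needed to store the LTS itself (no signatures, no hash tables)."""
    return states * bytes_per_state + transitions * bytes_per_transition


# Cycle-free lift6 sizes, taken from Table 3.
LIFT6_STATES = 33_946_699
LIFT6_TRANSITIONS = 165_312_102

# Sequential layout: 4 bytes per state, 8 bytes per transition.
sequential = lts_bytes(LIFT6_STATES, LIFT6_TRANSITIONS, 4, 8)
# Distributed layout: 8 bytes per state, 24 bytes per transition.
distributed = lts_bytes(LIFT6_STATES, LIFT6_TRANSITIONS, 8, 24)

ratio = distributed / sequential

if __name__ == "__main__":
    print(f"sequential:  {sequential / 2**20:.0f} MiB")
    print(f"distributed: {distributed / 2**20:.0f} MiB")
    print(f"ratio:       {ratio:.2f}")
```

For this graph the ratio comes out just under 3, because transitions dominate the storage and their cost triples (24 versus 8 bytes) while the cost per state only doubles.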

Table 4: Sequential tool comparison (time = total time, red = reduction time, ce = cycle elimination time, mem = memory in MB). 

| tool                              | measure  | lift6  | swp6   | 1394fin | fr53    |
|-----------------------------------|----------|--------|--------|---------|---------|
| bcg_min                           | time     | 1251   | 1298   | 20906   | 204     |
|                                   | mem      | 6493   | 10699  | 8226    | 15870   |
| ltsmin sequential, classic        | time     | 261    | 342    | 248     | 305     |
|                                   | red      | 225    | 287    | 218     | 237     |
|                                   | mem      | 2939   | 5464   | 3473    | 9744    |
| ltsmin sequential, ce + classic   | time     | 298    | 264    | 231     | 1247    |
|                                   | red      | 261    | 209    | 201     | 1180    |
|                                   | mem      | 2203   | 3625   | 2482    | 5377    |
| ltsmin sequential, ce + inductive | time     | 191    | 166    | 144     | 715     |
|                                   | red      | 154    | 111    | 114     | 651     |
|                                   | mem      | 2299   | 3573   | 2724    | 5462    |
| ltsmin distributed, classic       | red      | 655    | 621    | 730     | 188     |
|                                   | mem      | 7116   | 12129  | 8657    | 16871   |
| ltsmin distributed, inductive     | ce + red | 64+246 | 73+133 | 62+272  | 624+476 |
|                                   | mem      | 5520   | 3587   | 6315    | 12991   |

To test how well the algorithms scale, we first eliminated the τ-cycles from the four examples and 
then ran the inductive reduction on 1, 2, 4 and 8 nodes with 1, 2, 4 and 8 cores per node. For these 
tests, we used a cluster with dual quad core Xeon 2GHz, 8GB memory machines connected with gigabit 
ethernet. The times needed for the reduction can be seen in Fig. 1. 

The graphs have been ordered from the smallest to the largest problem. It is interesting to see that for 
the smallest problem (swp6), the first time that using more workers leads to more rather than less time is 
at 2 nodes, 2 cores per node. For the next two (lift6, 1394fin) this happens at 2 nodes, 4 cores per node and 
for the largest (fr53) at 4 nodes, 4 cores per node. 

It is also clear that using 8 cores instead of 4 is problematic. For 1 and 2 nodes the performance 
increase is small and for 4 and 8 nodes, the performance actually gets worse. Taken together with 
the huge difference in performance between the sequential and the distributed tool this leads to the 
(unsurprising) conclusion that it would be better to change the implementation to be aware of which 
workers are local (allow shared memory) and which workers are remote (require message passing). We 
leave such a tuned heterogeneous cluster-of-multi-cores implementation for future work. 



5 Conclusion 

We have defined the notion of inductive branching signature and proven that it corresponds to branching 
bisimulation. We have given a distributed algorithm that computes the coarsest branching bisimulation 
using inductive signatures. In the experiments section, we have shown that it is possible to implement 
the algorithm in such a way that it scales for up to 8 workers with 1 or 2 cores. 

The current prototype is good enough to show the merit of the concept of inductive signatures. However, 
it can be optimized in several ways. For example, the information about edges between two workers 
is currently stored by both the source worker and the destination worker. If both workers are on the same 
machine, then they could share a single instance of the data. Similarly, the algorithm uses a lot of small 
messages. For good performance, message combining is needed, which is currently done at the worker 
level, but could be done at the node level instead. 

Because strong bisimulation is a special case of branching bisimulation, our algorithm can also be 
used for strong bisimulation. However, for branching bisimulation we can eliminate τ-cycles to get 
a well-founded partition. For strong bisimulation, we will have to come up with a good heuristic to 
automatically find well-founded partitions. 

As a final conclusion, we note that inductive signatures for branching bisimulation improve time and 
memory requirements compared to classical signatures, both in a sequential and a distributed 
implementation. Of course, distributed minimization can handle larger graphs that do not fit in the memory 
of a single machine. Additionally, the distributed version using 8 cores on 2 nodes consistently beats the 
best sequential algorithm in time. 

Figure 1: Distributed reduction times for inductive branching bisimulation 